Robust Techniques for Organizing and Retrieving Spoken Documents

نویسنده

  • James Allan
چکیده

Information retrieval tasks such as document retrieval and topic detection and tracking (TDT) show little degradation when applied to speech recognizer output. We claim that the robustness of the process is because of inherent redundancy in the problem: not only are words repeated, but semantically related words also provide support. We show how document and query expansion can enhance that redundancy and make document retrieval robust to speech recognition errors. We show that the same effect is true for TDT’s tracking task, but that recognizer errors are more of an issue for new event and story link detection.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Robust spoken document retrieval methods for misrecognition and out-of-vocabulary keywords

This paper describes a Japanese spoken document retrieval system that is robust for Out-of-Vocabulary (OOV) words. A standard approach to spoken document retrieval is to automatically transcribe spoken documents into word sequences, which can be directly matched against queries. In this approach, the documents including OOV words and words misrecognized as other words cannot be retrieved. To av...

متن کامل

Spoken Content-Based Audio Navigation (SCAN)

We describe SCAN, a system for retrieving and browsing speech documents from large audio corpora that uses new information retrieval and speech processing techniques to create easily navigable presentations of documents relevant to a user query. Experiments show that the new interface is more effective than simple speechalone interfaces.

متن کامل

The MERL SpokenQuery information retrieval system a system for retrieving pertinent documents from a spoken query

This paper describes some key concepts developed and used in the design of a spoken-query based information retrieval system developed at the Mitsubishi Electric Research Labs (MERL). Innovations in the system include automatic inclusion of signature terms of documents in the recognizer’s vocabulary, the use of uncertainty vectors to represent spoken queries, and a method of indexing that accom...

متن کامل

Open-vocabulary spoken-document retrieval based on query expansion using related web documents

This paper proposes a new method for open-vocabulary spoken-document retrieval based on query expansion using related Web documents. A large vocabulary continuous speech recognition (LVCSR) system first transcribes spoken documents into word sequences, which are then segmented into semantically cohesive units (i.e., stories) using a text segmentation technique. Given a text query word, Web docu...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • EURASIP J. Adv. Sig. Proc.

دوره 2003  شماره 

صفحات  -

تاریخ انتشار 2003